A Trainable Visually-grounded Spoken Language Generation System
نویسنده
چکیده
A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using these structures, a planning algorithm integrates syntactic, semantic, and contextual constraints to generate natural and unambiguous descriptions of objects in novel scenes. The output of the generation system is synthesized using word-based concatenative synthesis drawing from the original training speech corpus. In evaluations of semantic comprehension by human judges, the performance of automatically generated spoken descriptions was comparable to human generated descriptions.
منابع مشابه
A trainable spoken language understanding system for visual object selection
We present a trainable, visually-grounded, spoken language understanding system. The system acquires a grammar and vocabulary from a “show-and-tell” procedure in which visual scenes are paired with verbal descriptions. The system is embodied in a table-top mounted active vision platform. During training, a set of objects is placed in front of the vision system. Using a laser pointer, the system...
متن کاملGrounding Natural Spoken Language Semantics in Visual Perception and Motor Control
A characteristic shared by most approaches to natural language understanding and generation is the use of symbolic representations of word and sentence meanings. Frames and semantic nets are two popular current approaches. Symbolic methods alone are inadequate for applications such as conversational robotics that require natural language semantics to be linked to perception and motor control. T...
متن کاملEvaluating a Trainable Sentence Planner for a Spoken Dialogue System
Techniques for automatically training modules of a natural language generator have recently been proposed, but a fundamental concern is whether the quality of utterances produced with trainable components can compete with hand-crafted template-based or rulebased approaches. In this paper We experimentally evaluate a trainable sentence planner for a spoken dialogue system by eliciting subjective...
متن کاملLearning visually grounded words and syntax for a scene description task
A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell" procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using ...
متن کاملA trainable generator for recommendations in multimodal dialog
As the complexity of spoken dialogue systems has increased, there has been increasing interest spoken language generation (SLG). SLG promises portability across application domains and dialogue situations through the development of applicationindependent linguistic modules. However in practice, rulebased SLGs often have to be tuned to the application. Recently, a number of research groups have ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002